Digitally aware neural dictation interface

ABSTRACT

Systems and methods for populating elements of content are disclosed. One method includes determining a plurality of elements of a document and receiving a first speech input from a user to enable a mode of operation. The method further includes authenticating the user by comparing the first speech input with at least one voice sample of the user and enabling the mode of operation. The method further includes receiving, in the mode of operation, a second speech input for filling out a first element of the document, determining an irregularity or distortion in the second speech input based on the first element, and identifying a missing syllable or a distorted syllable. The method further includes refining the second speech input into at least one matching syllable, converting the refined second speech input into text, and populating the first element with the text.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/600,656, titled “Digitally Aware Neural Dictation Interface,” filed Oct. 22, 2019, which is a continuation of U.S. patent application Ser. No. 16/600,242, titled “Digitally Aware Neural Dictation Interface,” filed Oct. 11, 2019, both of which are incorporated herein by reference in their entireties and for all purposes.

TECHNICAL FIELD

Embodiments of the present disclosure relate generally to the field of hands-free input modalities and, in particular, to allowing a user of a device to populate a plurality of fields of a form displayed on the device using a voice input.

BACKGROUND

Traditionally, filling or populating an electronic form using an electronic device (e.g., laptop, smart phone, etc.) required users to manually type in the values of the fields of the form using a keyboard. To improve efficiency and save time, users may receive help filling out the form through software that pre-fills or auto-completes certain fields of the form (e.g., name, home address, etc.). Further, users with impaired eyesight may use screen readers that read aloud text that appears on the display to help them fill out the form. However, such screen readers lack the ability to recognize specific fields of a form. Therefore, improved systems that help users fill out electronic forms faster and more efficiently are desired.

SUMMARY

A first example embodiment relates to a user device configured to provide a conversational electronic form that enables a user to speak in a conversational manner to fill out an electronic form. The user device includes a processing circuit comprising one or more processors coupled to non-transitory memory. The processing circuit is structured to: receive, by a microphone of the user device, a speech input from the user corresponding to a value of a current field of a plurality of fields of an electronic form provided on a display screen of the user device; convert the speech input into the value for the current field; display, on the display screen of the user device, the value in the current field for visual verification by the user; prompt, by a speaker of the user device, the user for information corresponding to a value of a next field of the plurality of fields in response to determining that the current field is populated with the corresponding value; and prompt, by the speaker of the user device, the user to submit the form in response to determining that the electronic form is complete based on the populated fields of the electronic form. Beneficially, by moving field-to-field based on a verbal input and output (e.g., a prompt for specific information regarding the next field and a user's voice input in response to the prompt), a conversational electronic form is provided that may be appealing and easy to use for users.

Another example embodiment relates to a method for providing a conversational electronic form. The method includes receiving a speech input from a user corresponding to a first field of a plurality of fields of an electronic form provided on a display screen of a user device; converting the speech input from an audible value into text; displaying, on the display screen of the user device, the text in the first field of the electronic form to allow a visual verification by the user; prompting, via a speaker of the user device, the user for information for a subsequent field in the plurality of fields upon each preceding field being populated with text from converted speech inputs; determining the form is complete and ready for submission based on a set of fields being populated with text in the plurality of fields; and enabling a submission of the completed form.

Still another example embodiment relates to a method. The method includes enabling at least a partial hands-free mode of operation of a user device; determining a characteristic of an electronic form provided on a display screen of the user device based on metadata associated with the electronic form; identifying and navigating to a first field of a plurality of fields of the electronic form based on the metadata; prompting, via a speaker of the user device, the user for information for the first field and a subsequent field in the plurality of fields upon each preceding field being populated with text from a speech input associated with each field; and enabling a submission of the electronic form based on a received vocal command.

Yet another example embodiment relates to a method for providing a graphical representation via a speech input. The method includes: receiving, by a processing circuit of a user device, a speech input from a user selecting an option from a drop down menu of an electronic form; and displaying, on a display screen of the user device, a graphical representation corresponding to the selected option from the drop down menu of the electronic form.

BRIEF DESCRIPTION OF THE FIGURES

These and other features, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings.

FIG. 1 is a block diagram of a system for providing a hands-free mode of operation of a user device by a user to populate a plurality of fields of a form using the user device, according to an example embodiment.

FIG. 2 is a block diagram of the user device of FIG. 1.

FIG. 3 is a block diagram of the provider computing system of FIG. 1.

FIG. 4A is a display output of the user device during the hands-free mode of operation, according to the example embodiment.

FIG. 4B is another display output of the user device during the hands-free mode of operation, according to the example embodiment.

FIG. 5 depicts an output on the display screen of the user device of FIGS. 1-2, according to an example embodiment.

FIG. 6 is a flowchart of a method of populating a plurality of fields of a form using the user device of FIG. 1, according to an example embodiment.

FIG. 7 is a flowchart of a method of providing refinements to speech input samples by the provider computing system of FIG. 1, according to an example embodiment.

DETAILED DESCRIPTION

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless dictated otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the present disclosure. It will be readily understood that the aspects of the present disclosure, as generally described herein and illustrated in the figures, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated and made part of this disclosure. It should be noted that the terms “voice input” and “speech input” are used interchangeably throughout the disclosure.

The present disclosure relates to systems, apparatuses, and methods of facilitating a hands-free mode of operation for a user to use a voice or speech input to populate a plurality of fields of an electronic form. Users often fill out different types of forms in their regular day-to-day activities. For example, a user may fill out a form to open a checking account at a bank, or may fill out a form for a membership at a local YMCA. Due to the smaller size of the display screen and keyboard on mobile devices as compared to desktops, laptops, etc., filling out forms is often more tedious and error-prone on these types of devices. Irrespective of whether a user uses a mobile device or some other form of device to fill out the form, the systems, methods, and apparatuses described herein relate to providing a hands-free mode of operation for a user to use a voice input to fill out a plurality of the fields of a form in a seamless and easy manner. Beneficially, the systems, apparatuses, and methods provide the user with the experience of a “conversational form” that prompts the user to populate each field of the form. In this regard, the user may seemingly engage in a conversation with the form, which aids in filling out the form quickly and efficiently. As a result, users may be able to complete long forms more easily and quickly. Further, a conversational form may be consistent with the expectations of busy consumers who may prefer the convenience of a hands-free mode of operation, such as using a voice input, to fill out a form during the course of their regular busy day.

Existing technologies lack the capability to populate electronic forms in this manner. For example, Siri, Alexa, and other virtual assistants that enable audible commands to be implemented (e.g., changing the volume of a speaker, making a purchase, etc.) do not enable filling out an electronic form in this manner. A user may command these virtual assistants to navigate a web browser to an Internet page that displays a form, but the user must then revert to conventional manual entry of the fields of the form. This results in a noticeable inconvenience in the usage of these virtual assistants.

The systems, methods, and apparatuses described herein enable accepting a voice input from a user to populate all or nearly all of the fields of a form by stepping through the fields of the form, one field at a time, without the necessity of a keyboard. The various embodiments of the present disclosure utilize a speech synthesis Application Programming Interface (API) to convert a received user voice input from speech to text (e.g., alpha, numeric, or alphanumeric text). Metadata associated with the form is used to determine the characteristics of the form. For example, the metadata provides an indication regarding the total number of fields, the names of the fields, the maximum number of characters allowed in a field, etc. The systems, methods, and apparatuses described herein utilize a plug-in structured to populate each field of the form based on the characteristics of the field as determined from the metadata. As the user populates a first field, a prompt is provided to the user to populate a next field until all or a sufficient number of fields are populated for submission. Thus, the systems, apparatuses, and methods described herein facilitate providing a “conversational form,” whereby there is a continuous interaction between the system and the user based on prompting the user to enter a value for each field of the form until the form is completely or sufficiently populated. This is analogous to a “conversation,” where people talk back and forth until the topic of discussion is fully addressed.
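To picture the speech-to-text step, the following is a minimal sketch using the Web Speech API available in some web browsers; the choice of this particular API and the '#name' field selector are assumptions for illustration only, not the disclosed implementation:

    // Minimal sketch: capture one spoken utterance and convert it to text
    // using the browser's Web Speech API, then place the text in a field.
    // The '#name' selector refers to a hypothetical field of an example form.
    const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    const recognition = new Recognition();
    recognition.lang = 'en-US';
    recognition.onresult = (event) => {
      const transcript = event.results[0][0].transcript; // recognized text
      document.querySelector('#name').value = transcript; // populate the field
    };
    recognition.start(); // begin listening via the device microphone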

It should be understood that not all embodiments require the user to be prompted to enter a voice input for the value of a field of a form. For example, in some embodiments, the systems and methods described herein move from a field that has been populated to a next field after a predetermined amount of time without necessarily prompting the user (i.e., an automatic movement from one field to the next based on the passage of a predetermined amount of time). In this instance, the user may keep track of the progress of filling out the form through a visual display of the form on the display screen of the device. Further, in some embodiments, a user may choose to enter the value of a subset of fields of the form through means other than voice input, such that the filling of the form may be through a combination of voice input and manual input (e.g., typed input). Thus, those of ordinary skill in the art will recognize various natural and logical alternatives to the systems, methods, and apparatuses of the present disclosure, with all such alternatives intended to fall within the spirit and scope of the present disclosure.

The “form” may be any type of form that is presented electronically. Thus, the “form” may be an application, such as a housing application, a credit card application, an account application, a club membership application (e.g., for a gym), and so on. The form may also be a survey. The form may further include a log-in page for various things, such as an account (e.g., a club account, a financial account, and the like). The unifying characteristics are that the form is electronically displayed and includes at least one field that needs to be filled or populated. The “field” of the form refers to a box where information is to be populated (e.g., name, date of birth, etc.). In other words, the “field” refers to a single item belonging to the form where a user input is or may be sought. The “value” refers to the characters that actually populate the field (e.g., a number or a string of alpha, numeric, or alphanumeric characters used to populate the field of a form).

Referring now to FIG. 1, a system 100 that facilitates and enables a hands-free mode of operation of a device for receiving a voice input to populate the fields of a form is shown, according to an example embodiment. The “hands-free mode of operation” refers to the ability of a user of the device to use the device without, or primarily without, use of their hands/fingers. In particular and as primarily used herein, the hands-free mode of operation refers to the ability of the user to populate the fields of a form without using a keyboard (i.e., without manual entry of the field values). However, if desired, manual entry, such as via a keyboard, may be used to supplement the hands-free mode of operation to fill out the form. As shown, the system 100 includes a network 110, a user 120 associated with a user device 130, a provider computing system 150, a speech synthesis API server 170, and a web server 180. The user 120, the user device 130, the provider computing system 150, the speech synthesis API server 170, and the web server 180 may be coupled to each other and therefore communicate through the network 110. The network 110 may include one or more of the Internet, cellular network, Wi-Fi, Wi-Max, a proprietary banking network, or any other type of wired and/or wireless network.

The user device 130 is a computing device owned by, associated with, or otherwise used by a user 120. The user may be an individual or a group of individuals using the user device 130. The user device 130 is structured to provide a hands-free mode of operation for the user 120 so that the user may provide a voice input that is converted to text (e.g., alphanumeric text) to populate a plurality of fields of a form. In some embodiments, the user device 130 is a mobile device, which includes any type of mobile device including, but not limited to, a phone (e.g., smart phone, etc.), a tablet computer, a personal digital assistant, wearable devices (e.g., glasses), and the like. In other embodiments, the user device 130 is a primarily non-mobile device, such as a desktop computer. In some other embodiments, the user device 130 is a device that is merely used by, rather than owned by, the user. For example, in this scenario, the device may be an automated teller machine (ATM) that is equipped with a microphone, a speaker, and a display. In the example shown, the user device 130 is structured as a smart phone.

The user device 130 is shown to include a processing circuit 133 having one or more processors 134 and a memory 135, a network interface circuit 131, and an input/output circuit 132. The memory 135 is shown to include or store a client application 136. In this regard, the memory 135 may contain instructions belonging to the client application 136, which can be executed by the one or more processors 134 of the user device 130. The network interface circuit 131 is structured to enable the user device 130 to exchange information over the network 110. The input/output circuit 132 is structured to facilitate the exchange of information with the user 120. An input device of or coupled to (depending on the embodiment) the input/output circuit 132 may allow the user to provide information to the user device 130, and may include, for example, a mechanical keyboard, a touchscreen, a microphone, a camera, a fingerprint scanner, and so on. An output device of or coupled to (depending on the embodiment) the input/output circuit 132 allows the user to receive information from the user device 130, and may include a display device (e.g., a display screen such as a touchscreen), a speaker, illuminating icons, LEDs, and so on. Each of these components is explained more fully herein with respect to FIG. 2.

The speech synthesis API server 170 is a computing system that is coupled through the network 110 to the user device 130 and the other systems/components of FIG. 1. The speech synthesis API server 170 may be a back-end server or computing system comprising one or more processors, memory devices, network interfaces, and computing components as described herein that facilitate and enable various operations. The speech synthesis API server 170 is structured to provide a speech synthesis API. The speech synthesis API is structured to recognize a voice input from a user 120 and to convert the voice input into text, such as alphanumeric text (and, in some embodiments, vice versa: from text into an audible output). In some embodiments, the user device 130 lacks built-in support for a speech synthesis API. In such embodiments, the user device 130 utilizes the speech synthesis API provided by the speech synthesis API server 170 to convert the user's 120 voice input into text and vice versa. In other embodiments, the user device 130 includes a speech synthesis API (which may be different from that provided by the server) that converts the voice input into text. The speech synthesis API server 170 is also shown to include a speech recognition circuit 172 and a speech translation circuit 174.

The speech recognition circuit 172 is structured to recognize and convert the user's 120 voice input into text. In operation, the user's voice may be received via a microphone of the user device 130, which converts the voice into data and transmits the data to the speech synthesis API server 170. The speech recognition circuit 172 breaks down the user's 120 voice input (i.e., the data) into syllables. The speech recognition circuit 172 then compares the syllables of the user's 120 voice input with known syllables stored in the non-transitory memory of the speech recognition circuit 172 to identify a plurality of syllables in the voice input. The speech recognition circuit 172 may then convert the plurality of syllables into characters through, for example, a look-up table maintained in the non-transitory memory of the speech recognition circuit 172 to complete the conversion of the user's 120 voice input into text. The converted text produced by the speech recognition circuit 172 is used to populate a relevant field of an electronic form. In other embodiments, a different process may be used to convert a user's voice input into alphanumeric text.
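The look-up step described above can be pictured with a short sketch; the table contents and the syllablesToText helper below are illustrative assumptions, not a disclosed implementation:

    // Illustrative sketch of the syllable-to-character look-up step.
    // A production recognizer is far more sophisticated; this only shows
    // the table-driven mapping the paragraph above describes.
    const SYLLABLE_TABLE = new Map([
      ['jon', 'John'],     // hypothetical entries mapping syllables to characters
      ['smith', 'Smith'],
    ]);

    function syllablesToText(syllables) {
      return syllables
        .map((syllable) => SYLLABLE_TABLE.get(syllable) ?? '') // known match or skip
        .filter(Boolean)
        .join(' ');
    }

    console.log(syllablesToText(['jon', 'smith'])); // "John Smith"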

The speech synthesis API server 170 is further structured to translate text (for example, text that is retrieved from an earlier populated field value of the form) into a voice output so that the value of the field may be read aloud back to the user. In some embodiments, the speech synthesis API server 170 may be structured to read out the field values of the form, which enables the verification of the field values for users 120 with visual impairments, thereby enabling assistive technology support for such users 120.

The speech translation circuit 174 is structured to convert the field value retrieved from the electronic form into an audible output. The speech translation circuit 174 may access a look-up table in the non-transitory memory of the speech translation circuit 174 to identify syllables in the alphanumeric text based on the text in the field. The speech translation circuit 174 then sends the identified syllables to the user device 130 to read out the determined voice output.

The web server 180 is a computing system that provides and hosts webpages/websites that are reachable by the user devices 130 via the network 110. The web server 180 may be a back-end server or computing system comprising one or more processors, memory devices, network interfaces, and computing components as described herein that facilitate and enable various operations. The web server 180 is structured to respond to requests from clients, such as the user device 130, to access a webpage identified by a particular Internet address. The web server 180 provides the contents of the requested webpage in response to a request for the webpage from the user device 130. The web server 180 includes a web page response circuit 182. The web page response circuit 182 retrieves from the non-transitory memory of the web server 180 relevant information pertaining to a particular webpage requested by the user device 130. In some embodiments, the relevant information includes the metadata associated with a webpage hosted by the web server 180 that has been requested to be downloaded by or provided to the user device 130.

The provider computing system 150 is owned by, managed/operated by, or otherwise associated with a provider institution. The provider institution may be a financial institution that offers one or more financial products and services (e.g., banking and banking applications such as mobile banking, lending products, payment and money transfer products and services, etc.). Additionally, the provider institution is an entity that facilitates and enables, at least partly, operation of the hands-free input modality for a user to populate the fields of an electronic form in the system 100. As described herein and in some embodiments, the provider computing system 150 is structured to facilitate the download of processing logic (in the form of a plug-in) to the user device 130 that enables an electronic form to be populated via a voice input.

As shown, the provider computing system 150 includes a processing circuit 154 including a processor 155 and a memory 156, a network interface circuit 151 structured to couple the system 150 to the other components of FIG. 1 through the network 110, a voice authentication circuit 153, and a provider enhancement circuit 157. The processor 155 may be implemented as one or more application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), a group of processing components, or other suitable electronic processing components. The memory 156 may be one or more devices (e.g., RAM, ROM, Flash memory, hard disk storage) for storing data and/or computer code for completing and/or facilitating the various processes described herein. The memory 156 may be or include non-transient volatile memory, non-volatile memory, and non-transitory computer storage media. The memory 156 may include database components, object code components, script components, or any other type of information structure for supporting the various activities and information structures described herein. The memory 156 may be communicably coupled to the processor 155 and include computer code or instructions for executing one or more processes described herein. More details about the various components of the provider computing system 150 are provided with respect to FIG. 3.

Referring now to FIG. 2, the details of the user device 130 are shown, according to an example embodiment. As mentioned above, the user device 130 includes a network interface circuit 131 and an input/output circuit 132. The network interface circuit 131 is structured to establish, enable, and maintain a connection with other components of the system 100 via the network 110. In this regard, the network interface circuit 131 is structured to enable the user device 130 to exchange information (e.g., data) over the network 110. The network interface circuit 131 includes logic and hardware components that facilitate a connection of the user device 130 to the network 110. For example, the network interface circuit 131 may include a cellular modem, a Bluetooth transceiver, a Wi-Fi router, a radio-frequency identification (RFID) transceiver, and/or a near-field communication (NFC) transmitter. Further, in some arrangements, the network interface circuit 131 includes cryptography capabilities to establish a secure or relatively secure communication session with certain components, such as the provider computing system 150.

The input/output circuit 132 is structured to enable the exchange of communication(s) (e.g., data, information, instructions, etc.) with a user of the user device 130. In this regard, the input/output circuit 132 is structured to exchange data, communications, instructions, etc., with an input/output component of the user device 130. Accordingly, in one embodiment, the input/output circuit 132 includes one or more input/output devices, such as a display screen 233 (or, display), a microphone 234, and a speaker 235. In another embodiment, the input/output circuit 132 may include machine-readable media for facilitating the exchange of information between the input/output device and the components of the input/output circuit 132. In still another embodiment, the input/output circuit 132 may include any combination of hardware components (e.g., input/output components such as a touchscreen) and machine-readable media. In the example shown, the input/output circuit 132 is machine-readable media executable by the one or more processors 134 and, in turn, coupled to the input/output devices (e.g., display screen 233, microphone 234, and speaker 235).

The display screen 233 is structured to present visual displays (e.g., graphical user interfaces) to a user 120. In particular, the display screen 233 is structured to provide and present an electronic form for the user 120 to fill. The display screen 233 may present prompts, notifications, and confirmations to the user 120. In the example shown, the display screen 233 is structured as a touchscreen display device.

The microphone 234 is structured to receive a voice input from the user 120 to fill a value of a field of the electronic form displayed by the display screen 233. The microphone 234 may have any type of typical structure included with a user device, such as the smart phone user device structure.

The speaker 235 is structured to provide an audible output. The audible output or noise may include a prompt, a notification, and a confirmation to the user 120 during the process of populating the fields of an electronic form. The speaker 235 may have any type of typical structure included with a user device, such as the smart phone user device structure. In some embodiments, the speaker 235 and the microphone 234 may be the same physical device/component of the user device.

Referring still to FIG. 2, the user device 130 includes a client application 136. The client application 136 is a computer program that executes or runs on the user device 130. The client application 136 may be implemented as a thin client application 239 or a native application 243. A thin client application 239 is a computer program that typically executes on a networked computer with limited resources of its own (i.e., not locally on the user device). Thus, a thin client application fulfils or obtains its computational needs by using the resources of a backend server. In some embodiments, the server is the provider computing system 150. In other embodiments, the server is a third-party server. In contrast, the native application 243 is a computer program that uses the computational power of the device on which it resides. For example, as mentioned above, the user device may be an ATM, in which case the native application may be hard coded into the non-transitory memory of the processor(s) of the ATM.

In some embodiments, the client application 136 is incorporated into an existing application, such as a mobile banking application. In this regard, the client application 136 includes an API and/or a software development kit (SDK) that facilitates the integration of other components with the client application 136. In other embodiments, the client application 136 is a separate application implemented on the user device 130. The client application 136 may be downloaded by the user device 130 prior to its usage, hard coded into the non-transitory memory of the user device 130 (i.e., a native application), or be a web-based application. In some implementations, the user 120 may have to log onto the user device 130 and access the web-based interface before using the client application 136.

As an example, the client application 136 may be a web browser application 241 (e.g., Google Chrome). In some embodiments, the web browser application 241 is structured to include a speech synthesis API for converting alphanumeric text to speech, and vice versa. In other embodiments, this functionality is lacking.

The client application 136 is shown to include a digitally aware neural dictation interface (DANDI) plug-in 237. The DANDI plug-in 237 (e.g., add-in, add-on, extension, etc.) is a program that adds additional features to the client application 136. In one embodiment, the DANDI plug-in 237 is implemented as program code in the JavaScript programming language. In other embodiments, the DANDI plug-in may have a different structure (e.g., be constructed with a different programming language). The DANDI plug-in 237 may be a downloadable component, which can be an add-on to an existing application such as a web browser application (e.g., Google Chrome). Thus, the DANDI plug-in 237 may include one or more APIs and/or SDKs that facilitate integration of the DANDI plug-in 237 into the client application 136. In other embodiments, the functionality of the DANDI plug-in 237 described herein may be hardcoded in the non-transitory memory accessible to a processor of the device (e.g., user device). In this instance, the DANDI plug-in 237 is a native feature on the device. In yet other embodiments, the DANDI plug-in 237 may be hard-coded into the client application such that the plug-in is not a “plug-in”; rather, the features and functionalities described herein are embedded as part of the client application. Thus, while described herein as a plug-in, it should be understood that this implementation embodiment is not meant to be limiting, as the present disclosure contemplates various other structural implementations.

The DANDI plug-in 237 is structured to integrate with the client application 136. In the embodiment shown, the DANDI plug-in 237 is a downloadable software component, which integrates with the client application 136 after being downloaded. For example, the DANDI plug-in 237 may be an add-on to the web browser application 241. In another embodiment, the DANDI plug-in 237 is hard coded into the client application 136 (as opposed to being downloadable). For example, in the case of the client application 136 being a native application 243 and when the device is an ATM, the DANDI plug-in 237 may be hard coded into the non-transitory memory for execution by the processor(s) of the ATM. In this way, the DANDI plug-in 237 is not downloaded to the ATM, but is ready for use upon running/using the ATM.

The DANDI plug-in 237 is structured to enable the client application 136 to determine and identify the characteristics of each field in the plurality of fields of an electronic form. In this regard, the DANDI plug-in 237 is structured to perform an analysis of the metadata associated with the electronic form. The metadata may be provided by the web server 180 to the DANDI plug-in 237. Alternatively, the DANDI plug-in 237 may extract the metadata from the webpage hosting the form (or, from the form itself when it is not hosted by a web page, such as a PDF form). The metadata associated with an electronic form defines the characteristics of each field in the plurality of fields of the electronic form. The characteristics may include, but are not limited to, an indication regarding a total number of fields of the form, the names of the fields, the data types of each of the fields, the maximum number of characters allowed in a field, the range of acceptable values for a field, etc. For example, in one embodiment, the data type for a “date” field of the electronic form is in MM-DD-YYYY format, with the value of the MM field being a 2-digit number, and the acceptable range of values for the MM field being from 01 through 12. The DANDI plug-in is structured to analyze the metadata to determine various characteristics of the form and, in particular, each of the fields of the form.
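As one illustration of such a metadata analysis, the short sketch below derives field characteristics from a webpage's form markup; the attributes read are standard HTML, while the describeForm helper is an assumption for illustration:

    // Sketch: derive field characteristics from a form's HTML metadata.
    // The attribute names (name, type, maxLength, required, pattern) are
    // standard HTML; describeForm itself is a hypothetical helper.
    function describeForm(form) {
      return Array.from(form.querySelectorAll('input, select, textarea')).map((el) => ({
        name: el.name,                                      // name of the field
        type: el.type,                                      // data type of the field
        maxLength: el.maxLength > 0 ? el.maxLength : null,  // max characters allowed
        required: el.required,                              // needed before submission
        pattern: el.getAttribute('pattern'),                // acceptable value format
      }));
    }

    const fields = describeForm(document.querySelector('form'));
    console.log(fields.length, 'fields:', fields.map((f) => f.name));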

The DANDI plug-in 237 is also structured to convert the voice input for each field of the electronic form into a value for the field. As described above and in one embodiment, the speech synthesis API function is provided within the client application 136 (e.g., web browser application 241). In this regard and in some embodiments, the DANDI plug-in 237 causes execution of the speech synthesis API of the web browser application 241 to convert the user's 120 voice input for a specific field into alphanumeric text. In this case, the DANDI plug-in 237 is coupled to the speech synthesis API of the client application 136 for converting the voice input into alphanumeric text. The DANDI plug-in 237 may then cause a populating of the text into a field in the electronic form.

In another embodiment, the speech synthesis API is not included with the client application 136. For example, the client application 136 may be a native application 243 (e.g., a client application executing on an ATM) that lacks support for a web browser to execute the speech synthesis API. Rather, the speech synthesis API is provided by the speech synthesis API server 170. In such embodiments, the DANDI plug-in is structured to interface with and access the speech synthesis API server 170 over the network 110. The DANDI plug-in may then transmit the voice input to the speech synthesis API server 170 over the network 110 to convert the user's 120 voice input into text.

In still another embodiment, the DANDI plug-in 237 itself may include the speech synthesis API. In this regard, the speech synthesis API server provides the speech synthesis API that is integrated into the DANDI plug-in 237. In this situation, the client application via the DANDI plug-in 237 itself is structured to receive a voice input, determine the characteristics of the voice input, and convert the voice input into text that is used as the value to populate the fields of the electronic form.

The DANDI plug-in 237 is further structured to navigate through the fields of the electronic form using the characteristics of the fields of the electronic form. The DANDI plug-in 237 is structured to determine and identify the characteristics of fields in the form by analyzing the metadata associated with the form from the web server 180. Thus, the web server 180 is coupled to the client application and DANDI plug-in 237. In another embodiment, the metadata analysis function may be included with the DANDI plug-in 237. This arrangement may be used on devices that may lack this feature, such as potentially certain ATMs. Irrespective, the DANDI plug-in 237 may analyze the metadata on the display that is providing the form through a variety of techniques. For example, in one embodiment, the DANDI plug-in parses the metadata that is implemented in one of a variety of languages, like XML, HTML, etc., that describe the various fields of the form to determine the number of fields in the form, the range of values that are associated with each field of the form, and so on. During the metadata analysis, the DANDI plug-in 237 also determines or identifies the fields that make up the electronic form, and identifies the relative positions of the fields on the electronic form. The DANDI plug-in 237 uses the acquired knowledge of the relative positions of the fields to determine a priority order of navigating to and populating the fields of the form. In one embodiment, a determined order for populating the fields is vertical (i.e., topmost field to bottommost field). In another embodiment, a determined order is left to right and top to bottom, in an analogous manner to left-to-right reading. In still another embodiment, the determined order is based on the characteristics of the fields based on the metadata. For example, only three of six depicted fields may be required to be populated in order to enable/allow submission of the electronic form. However, the three fields are randomly dispersed on the form (e.g., the first, fourth, and sixth fields when reading left-to-right and top-to-bottom, etc.). In this situation, the determined order is these three fields first, in a reading manner (left-to-right and top-to-bottom). At this point, a prompt may be provided to the user indicating that all the required fields are populated and inquiring whether the user would like to submit/review the form or populate the optional fields. In this manner, navigation to the fields may be strategic rather than a rote left-to-right or top-to-bottom manner, as shown in the sketch below. Such a process may save time and improve efficiency.
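A minimal sketch of one such priority ordering follows (required fields first, each group in reading order); the orderFields helper and the use of element positions via getBoundingClientRect are assumptions for illustration:

    // Sketch: order form fields so that required fields come first, with
    // each group sorted in left-to-right, top-to-bottom reading order.
    function orderFields(fieldElements) {
      const byReadingOrder = (a, b) => {
        const ra = a.getBoundingClientRect(); // relative position on the page
        const rb = b.getBoundingClientRect();
        return ra.top - rb.top || ra.left - rb.left;
      };
      const required = fieldElements.filter((el) => el.required).sort(byReadingOrder);
      const optional = fieldElements.filter((el) => !el.required).sort(byReadingOrder);
      return [...required, ...optional]; // prompt for required fields first
    }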

Navigating through each field of the form may occur in a variety of ways. In this regard, the DANDI plug-in 237 is structured to navigate to a next field of the form via various different processes. For example, after a field is populated, the DANDI plug-in 237, using a timer function, may automatically navigate to the next field according to the defined order (described above) after the passage of a predetermined amount of time (e.g., 1.5 seconds, 3 seconds, etc.). In another example, the DANDI plug-in 237 may receive a vocal command that instructs movement to the next field (e.g., “Please move to the next field”). Based on the metadata analysis, the client application via the DANDI plug-in 237 knows the information that is expected for a particular field (e.g., date of birth), such that when a command is received, the command is easily differentiated from the information used to populate the field. In still another example, a manual input from the user may be used to cause the movement from field to field (e.g., on the touchscreen, the user may touch the field he/she wants to fill next). In yet another embodiment, a prompt is provided by the DANDI plug-in 237 (e.g., using a speaker of the user device) to request information for the next field in the form according to the determined order of populating the fields. An example is as follows: [Field 1, Name] “Please provide your name.” [Field 2, Date of birth] “Thank you. Please provide your date of birth.” [Field 3, Address] “Please provide your address.” In this example, once information is received from the user and populated into the form, a confirmation is provided (e.g., “thank you”) and the information for the next field is requested. Contemporaneously, the field where information is currently sought may be highlighted on the screen. This enables two forms of indication to the user (i.e., the audible prompt for certain information and the visual highlighting of the form). In this embodiment, a conversational form is provided, as sketched below. Once all of the required fields are populated, a prompt may be provided to the user inquiring whether he/she would like to submit the form and/or review their answers before submission. The form may then be submitted (e.g., by clicking submit or via a voice command). In other embodiments, any combination of these examples may be used.
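The prompt-and-listen cycle just described can be pictured with the sketch below, which combines the browser speech APIs from the earlier sketches; the askFor, listenOnce, and fillConversationally helpers are hypothetical names, not part of any disclosed implementation:

    // Sketch of a conversational fill loop: speak a prompt for each field,
    // listen for the user's answer, populate the field, and move on.
    function askFor(promptText) {
      return new Promise((resolve) => {
        const utterance = new SpeechSynthesisUtterance(promptText);
        utterance.onend = resolve;                 // resolve once spoken
        speechSynthesis.speak(utterance);
      });
    }

    function listenOnce() {
      return new Promise((resolve) => {
        const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
        const recognition = new Recognition();
        recognition.onresult = (e) => resolve(e.results[0][0].transcript);
        recognition.start();
      });
    }

    async function fillConversationally(orderedFields) {
      for (const field of orderedFields) {
        field.scrollIntoView();                    // visual cue for the current field
        await askFor(`Please provide your ${field.name}.`);
        field.value = await listenOnce();          // populate with the converted speech
        await askFor('Thank you.');                // audible confirmation
      }
      await askFor('All required fields are populated. Say submit to continue.');
    }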

In some embodiments, one or more functions associated with the DANDI plug-in may be performed by the provider computing system 150. Thus, the DANDI plug-in via the user device may transmit relevant data or information to the system 150 for processing with the specific processing capabilities described below.

Referring now to FIG. 3, the provider computing system 150 of FIG. 1 is shown according to an example embodiment. The provider computing system 150 is shown to include a network interface circuit 151, a plug-in download circuit 152, a voice authentication circuit 153, a processing circuit 154, and a provider enhancement circuit 157. The processing circuit 154 includes one or more processors 155 and a non-transitory memory 156. The processing circuit 154 is described above.

The network interface circuit 151 (or, network interface) is structured to enable the provider computing system 150 to establish connections with other components of the system 100 via the network 110. The network interface circuit 151 is structured to enable the provider computing system 150 to exchange information over the network 110 (e.g., with the user device 130). The network interface circuit 151 includes program logic that facilitates connection of the provider computing system 150 to the network 110. The network interface circuit 151 supports communications between the provider computing system 150 and other systems, such as the user device 130. For example, the network interface circuit 151 may include a cellular modem, a Bluetooth transceiver, a Bluetooth beacon, a radio-frequency identification transceiver, and a near-field communication transmitter. Thus, the network interface circuit 151 may include the hardware and machine-readable media sufficient to support communication over multiple channels of data communication. Further, in some arrangements, the network interface circuit 151 includes cryptography capabilities to establish a secure or relatively secure communication session with the user device 130.

The plug-in download circuit 152 is structured to create, maintain, and provide the DANDI plug-in 237 for download to the user device 130. For example, the user device 130 may request the download of the DANDI plug-in 237 from the plug-in download circuit 152. In some embodiments, upon receiving a request from a user device 130 for download of the DANDI plug-in, the plug-in download circuit 152 causes the DANDI plug-in 237 to be downloaded to the user device 130. Thus, this embodiment is used when the DANDI plug-in 237 functionality is not hardcoded into either the device or the client application, i.e., when the described functionality is being added to an existing application (e.g., a web browser).

The voice authentication circuit 153 is structured to authenticate a voice of a user received via the network interface circuit 151 from a user device 130. In some embodiments, authentication of a user 120 may be required to use the hands-free mode of operation via the DANDI plug-in 237 based on the requirements of the form (e.g., a credit card application provided by the provider institution). The voice authentication circuit 153 is structured to facilitate authenticating/verifying a user's voice. In some embodiments, the voice authentication circuit 153, upon receiving a voice input from the user device 130, compares the voice input with known voice samples of the user's speech stored in the provider database 365 (described herein) for a match or a substantial match. The voice authentication circuit 153 then notifies the user device 130 about the result of the match. In case of a match, the user device 130 may skip the step of requiring the user 120 to log in with authentication credentials, since the user 120 is recognized/authenticated through the user's 120 voice. Thus, in some embodiments, the voice authentication circuit 153 is structured to provide the benefit of facilitating the continuation of a user's 120 session without the necessity of the user being forced to provide log-in authentication credentials in the middle of using a commercial banking application. This feature may be advantageous for forms that require sensitive information to be provided (e.g., credit card applications, forms that require personal identifying information, etc.). In operation, the user may be authenticated into their device and then subsequently authenticated via their voice to use the hands-free mode of operation to fill out the form. In this regard and based on the metadata analysis, when predefined sensitive information is determined to be required for the form, the DANDI plug-in 237 via the client application may automatically transmit the user's voice to the voice authentication circuit 153 for an additional authentication analysis to be performed. This adds an extra layer of security that is not typical for most forms that are populated.
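The disclosure does not specify a matching algorithm, but one generic way such a comparison could work is sketched below, comparing feature vectors extracted from voice samples using cosine similarity; the feature extraction step, the isVoiceMatch helper, and the 0.9 threshold are all assumptions for illustration:

    // Heavily hedged sketch: compare a voice input to stored samples by
    // cosine similarity over numeric feature vectors. How the features are
    // extracted from audio is outside the scope of this sketch.
    function cosineSimilarity(a, b) {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    function isVoiceMatch(inputFeatures, storedSampleFeatures, threshold = 0.9) {
      // A substantial match to any stored sample authenticates the user.
      return storedSampleFeatures.some((s) => cosineSimilarity(inputFeatures, s) >= threshold);
    }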

The provider computing system 150 further includes a provider enhancement circuit 157 that includes a speech enhancement circuit 359, a user-specific auto-complete circuit 361, a provider command dictionary 363, and a provider database 365. The provider database 365 is structured to hold, store, categorize, and/or otherwise serve as a repository for information regarding the user (e.g., the user's historical voice inputs). The provider database 365 is structured to store and selectively provide access to the stored information. The provider database 365 may have any one of a variety of computing structures. Although shown as being a separate component of the provider computing system 150, in some embodiments, the provider database 365 may be part of the memory 156.

The speech enhancement circuit 359 is structured to enhance the quality of the input voice samples received from a user device 130 for storage in the provider database 365. In some embodiments, the enhancement in the quality of the input voice samples may be based on the removal of undesirable noise from the samples (e.g., the input voice samples may include undesirable noise from a potentially noisy surrounding of the user 120 due to the user 120 being in a busy marketplace, using public transportation, etc.). The client application 136 of the user device 130 may transmit samples of a user's 120 voice input to the speech enhancement circuit 359. The speech enhancement circuit 359 digitally enhances the user's voice samples by applying filtering and digital processing techniques in order to obtain better quality samples of the user's original voice input. In some embodiments, the speech enhancement circuit 359 is structured to provide the ability to mitigate distortions or irregularities in the user's voice input due to the presence of an accent in the voice, or a temporary condition (for example, a cold) affecting the user's voice, thus enhancing the quality of the voice input. For example, in one embodiment, the speech enhancement circuit 359 extrapolates the missing or distorted syllables in the user's 120 voice input based on comparing the current voice input of the user 120 with past voice inputs received from the particular user 120 stored in the provider database 365. In some embodiments, the speech enhancement circuit 359 executes artificial intelligence based machine learning algorithms to compare the identified syllables in the user's voice input to a database of syllables stored in the provider database 365. The algorithms find the closest match for any distorted or otherwise irregular syllables in the user's voice input in the provider database 365, and cause such syllables to be replaced by the corresponding matching syllables in the provider database 365.
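One simple way to picture the closest-match replacement is an edit-distance search over a stored syllable list; the editDistance and repairSyllable helpers below are illustrative assumptions rather than the disclosed algorithm, which is described only as a machine learning comparison:

    // Sketch: replace a distorted syllable with the nearest stored
    // syllable by Levenshtein edit distance (a stand-in for the actual
    // machine-learning matching described above).
    function editDistance(a, b) {
      const d = Array.from({ length: a.length + 1 }, (_, i) =>
        Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
      for (let i = 1; i <= a.length; i++) {
        for (let j = 1; j <= b.length; j++) {
          d[i][j] = Math.min(
            d[i - 1][j] + 1,                                   // deletion
            d[i][j - 1] + 1,                                   // insertion
            d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)); // substitution
        }
      }
      return d[a.length][b.length];
    }

    function repairSyllable(distorted, knownSyllables) {
      return knownSyllables.reduce((best, s) =>
        editDistance(distorted, s) < editDistance(distorted, best) ? s : best);
    }

    console.log(repairSyllable('smif', ['smith', 'jones'])); // "smith"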

In some embodiments, the speech enhancement circuit 359 is structured to translate a user's 120 voice input from a first language to a second language (e.g., from a foreign language to the English language), such that the translated voice input may be used to populate the fields of a form in the second language. The speech enhancement circuit 359 is, thus, structured to provide an advantage in that the form is able to be populated in English even when the voice input is in a different language. Thus, providers of the electronic form need not translate their form(s) into various languages to accommodate the various languages of the world. Rather, a translation provided by the circuit 359 may occur to enable wide usage.

The provider command dictionary 363 is structured to provide a dictionary of commands recognized by the hands-free voice input system. In some embodiments, the provider command dictionary 363 receives a voice input representing a user command from the client application 136 of the user device 130. For example, a user may provide a command to the user device 130 to modify the value of an earlier populated field. As another example, a user 120 may issue voice commands for the initiation and termination of the hands-free mode of operation. As still another example, a user 120 may issue a command to read out all the fields of a form populated so far (the command may be issued in the middle of populating the form). It should be understood that the examples of commands described herein are non-limiting in nature, and the provider command dictionary 363 is structured to support a much larger set of commands than the examples provided. Thus, rather than using the voice input to just populate the electronic form, the commands are used to provide additional functionality that may enhance the user experience.

In some embodiments, the client application 136 of the user device 130 communicates with the provider command dictionary 363 through the network interface circuit 151 of the provider computing system 150 to leverage the increased capability of command recognition in the provider command dictionary 363. The provider command dictionary 363 thus expands the command recognition capability built into the client application 136 of the user device 130 via the DANDI plug-in 237.

The user-specific auto-complete circuit 361 is structured to provide auto-complete suggestions for a particular user 120. In some embodiments, the user-specific auto-complete circuit 361 receives voice inputs from the client application 136 of the user device 130. The user-specific auto-complete circuit 361 then stores the voice inputs in the provider database 365 on a per-user basis, thus accumulating user 120 provided voice inputs for multiple fields of multiple forms. Further, the user may be a customer of the provider institution, in which case the provider computing system 150 may store various other information regarding the user (e.g., name, date of birth, address, ethnicity of the user, etc.). In some embodiments, the user-specific auto-complete circuit 361 utilizes the voice inputs stored in the provider database 365 for a specific user 120 to perform a multi-field analysis of the user's stored voice inputs to determine auto-complete suggestions. For example, in one embodiment, the user-specific auto-complete circuit 361 executes algorithms to recognize patterns in a user's 120 voice inputs across multiple fields of multiple forms stored for the user 120 in the provider database 365 to provide specific auto-complete suggestions that are tailored to the particular user 120. This may speed up the filling out of the form.
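As a toy illustration of per-user suggestion, the sketch below proposes the most frequent prior value for a given field name; the suggestValue helper and the history record shape are assumptions, and a production pattern-recognition approach would be more elaborate:

    // Sketch: suggest the most frequent prior value a user has supplied
    // for a field with the same name, drawn from stored past inputs.
    function suggestValue(fieldName, history) {
      const counts = new Map();
      for (const entry of history) {
        if (entry.field === fieldName) {
          counts.set(entry.value, (counts.get(entry.value) ?? 0) + 1);
        }
      }
      let best = null;
      for (const [value, count] of counts) {
        if (!best || count > best.count) best = { value, count };
      }
      return best?.value ?? null; // null when no history exists for the field
    }

    const history = [
      { field: 'city', value: 'Denver' },
      { field: 'city', value: 'Denver' },
      { field: 'city', value: 'Boulder' },
    ];
    console.log(suggestValue('city', history)); // "Denver"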

In operation, a user 120 interacts with the user device 130 to initiate the process of a hands-free mode of operation for populating a form using the user device 130. The form may be an application (e.g., a credit card application, an account application, a gym membership application, etc.), a survey, or any other form that is provided electronically. In some implementations, the user provides an authentication credential to access the user device 130 (e.g., a biometric, a passcode, etc.). In one embodiment, the user 120 clicks on a portion of a form or a part of the display screen 233 of the user device 130 to initiate the hands-free mode of operation. For example, a manual input, such as clicking on a DANDI icon, is implemented to initiate the hands-free mode of operation. In another example, the user 120 may provide a specific voice command (e.g., “initiate DANDI”). In either situation, the client application 136 running on the user device 130 recognizes the command to initiate the hands-free mode of voice input to subsequently enable the populating of the fields using a voice input. Upon enabling the hands-free mode of operation, the user device 130 may be structured to provide an indication to the user 120 that the hands-free input modality is active. For example, in one embodiment, the user device 130 is structured to provide an animation on the display screen 233 of the user device 130 regarding the indication (e.g., an illuminating icon, a graphic, etc.).

As described above, the user device 130 may prompt the user 120 through the speaker 235 to provide a voice input for the value of a field of the form that the user wants to populate once the hands-free mode of operation is enabled. As alluded to before, in one embodiment, the DANDI plug-in 237 is structured to navigate to various fields of an electronic form and populate the values in the fields of the form. As described above, once all or a sufficient number of the fields are populated, the form may be submitted (e.g., the application submitted, the PDF determined to be complete and then saved, access to an account provided, etc.).

While certain functions are described above separately with respect to the DANDI plug-in 237, in some embodiments, various functions may be included with the DANDI plug-in 237. For example, and as described above, the DANDI plug-in may include the speech synthesis API. In this regard and when running, the DANDI plug-in 237 is structured to receive a voice input and then convert the voice input to alphanumeric text. In other embodiments, when the client application includes a similar function, the speech synthesis API of the DANDI plug-in may be disabled to reduce the processing requirements of the DANDI plug-in. In this scenario, the voice-to-text conversion may be done by the speech synthesis API of the client application and then provided to the DANDI plug-in 237. Accordingly, in one embodiment, the DANDI plug-in 237 may include the speech synthesis API and the metadata analysis function. As a result, the DANDI plug-in 237 may itself be structured to convert a voice input to alphanumeric text, navigate between the fields of the form, and ultimately enable the user to populate the form. As still another example, certain functions of the provider computing system 150 described above, such as speech enhancement, voice authentication, and translation, may also be included with the DANDI plug-in 237. In some embodiments, by keeping these functions accessible via the plug-in rather than a part of the plug-in 237, the local processing requirements for running the plug-in 237 may be reduced to improve the processing speed.

In the example shown, the DANDI plug-in 237 includes the speech synthesis API and the metadata analysis feature. Further, the voice authentication, translation, and enhancement features are provided by the provider computing system to reduce the size of the plug-in 237. That said, when the plug-in 237 is used with an application that already has a built-in speech-to-text conversion feature, the plug-in 237 may use the output of that feature to avoid duplicative features. Alternatively, the plug-in 237 may use the output of that feature for comparison against the text determined by the plug-in 237. This may be used to help the plug-in 237 “learn” and become more refined over time.

In one specific example, the provider computing system 150 may include artificial intelligence or deep learning capabilities structured to optimize operation of the plug-in 237 over time (hence, digitally aware neural dictation interface). For example, the processing circuit 154 may include a convolutional neural network associated with one or more of the circuits, such as the speech enhancement circuit 359. In operation, the circuit 359 receives multiple samples of the user's voice (inputs). Convolution layers and programming are used by the circuit 359 to identify the syllables in the user's voice, patterns of speech, and other characteristics of the user's voice. This may include referencing other users' voice samples. This node processing results in a plurality of layers. Using a learning process (e.g., back-propagation), the circuit 359 begins to readily determine and identify the features associated with the user's voice as falling within defined categories or classes (e.g., typically used words such as “the” and “next” may form a class, nouns may form a class, and other ways to group voice inputs may form additional classes). As more learning is performed, the circuit 359 may more quickly determine a user's voice input to be a certain letter, word, or phrase. This may result in the circuit 359 developing a list that correlates the user's voice samples to these known outputs. As such, in operation, these letters, words, or phrases may be more quickly determined by the plug-in 237 locally going forward, which enhances operation of the plug-in. In other embodiments, different neural network, machine-learning, or artificial intelligence processes may be used.

Referring now to FIG. 4A, a display output 400 on the display screen 233 of a user device during a hands-free mode of operation for populating a form is shown, according to an example embodiment. In the embodiment of FIG. 4A, an animation 402 is displayed on the display screen 233, which is an indication to the user that the at least partial hands-free mode of operation of the user device is enabled. Thus, the animation 402 provides a clear indication on the display screen 233 to the user 120 that the system is ready for a voice input to populate individual fields of the form. The voice input provides a value of a field of the form. The voice input may also be a voice command to the user device to perform a specific operation. In the example of FIG. 4A, the user 120 issues a command to the client application 136 to modify the value of an earlier populated field pointed to by reference numeral 404. In some implementations, upon receiving an input value for the field to be modified, the client application 136 is structured to revert back to the next field where it was last awaiting a user voice input to populate the value of the field, which is pointed to by reference numeral 406. The populated values of the fields of the form are available for visual verification by the user 120 on the display screen 233 of the user device 130.

Referring now to FIG. 4B, another display output 450 on the display screen 233 of the user device 130 during a hands-free mode of operation for filling out a form is shown, according to an example embodiment. In the embodiment of FIG. 4B, there is no equivalent of the animation 402 of FIG. 4A displayed, because the client application can only be in a listening mode awaiting the user's 120 voice input, or in the prompting mode (prompting the user for the value of the next field of the form, for example). The absence of the animation is an indication to the user 120 that the at least partial hands-free mode of operation of the user device 130 by the user 120 is currently disabled. In some embodiments, a pop-up display 460 is provided on the display screen 233 of the user device 130 (or the user 120 is prompted through the speaker (not shown) of the user device 130). The previous and newly populated values 455 are available for visual verification by the user 120 on the display screen 233 of the user device 130.

Referring now to FIG. 5, an embodiment 500 of a display output 502 of data visualization through voice input is shown, according to an example embodiment. FIG. 5 depicts a bar graph 504 as an example data visualization surfaced through a user's 120 voice input. In the example embodiment, a display output 502 on the display screen 233 of the user device 130 is shown based on the user's 120 selection of an option from a drop-down menu to display the underlying data as a bar graph (as opposed to other possible menu options, such as a pie chart, a scatter diagram, or a time series graph), during a hands-free mode of operation. In the example embodiment of FIG. 5, the user's 120 voice input is used for purposes other than to provide a value for a field of a form or to provide a recognized command. In the example embodiment, the user's voice input is used to select an option from a drop-down menu displayed on the display screen 233 of the user device 130, in order to visualize the associated data. In operation, the client application 136 executing on the user device 130 is structured to process different types of metadata and processing logic during the hands-free operation of the user device 130, to provide user experiences that go beyond just populating the plurality of fields of a form by voice input.
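
As a purely illustrative sketch of dispatching a voice-selected menu option to a data visualization, the snippet below maps an option string to a chart; the option strings and sample data are assumptions, and matplotlib stands in for whatever charting facility the client application uses.

```python
import matplotlib.pyplot as plt

# Dispatch a recognized drop-down selection (converted from voice) to a chart.
def render(option: str, labels: list, values: list) -> None:
    x = range(len(values))
    if option == "bar graph":
        plt.bar(x, values, tick_label=labels)
    elif option == "pie chart":
        plt.pie(values, labels=labels)
    elif option == "scatter diagram":
        plt.scatter(x, values)
    elif option == "time series graph":
        plt.plot(x, values)
    else:
        raise ValueError(f"unrecognized menu option: {option}")
    plt.show()

# e.g., the converted text of the user's voice input selects the option:
render("bar graph", ["Q1", "Q2", "Q3"], [10, 14, 9])
```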

It should be understood that providing voice input by a user 120, or surfacing data visualization through voice input (as explained above in the discussion of FIG. 5), are non-limiting examples of hands-free operation of a user device 130 by a user 120 via the DANDI plug-in 237. In some embodiments, other example uses of hands-free operation are envisaged by, and fall under the scope of, the present disclosure. In an example embodiment, hands-free operation is applicable to a Virtual Reality (VR) system that may include at least a headset or a visor, and a microphone. Conventionally, a user 120 engaged in a VR experience who has to fill out a form would have to take the visor off (i.e., exit the VR experience) and provide inputs for form-filling using a keyboard (or equivalent). But the hands-free mode of operation enabled by the present disclosure may be used for form-filling when the VR headset either supports a client application capable of downloading the DANDI plug-in or has that functionality hard-coded into the computer, thereby alleviating the need for the user to disengage from the VR experience (e.g., without having to take the VR headset or visor off).

Referring now to FIG. 6, a flowchart depicting a method 600 of populating a plurality of fields of a form and providing a conversational electronic form using the user device of FIG. 1 is shown, according to an example embodiment. Because the method 600 may be implemented with the components of FIG. 1, reference may be made to various components of the system 100 to aid explanation of the method 600.

At process 602, an electronic form-filling voice function is provided. In one embodiment, a plug-in and, in particular, the DANDI plug-in 237 is provided by the provider computing system 150. In this regard, process 602 is described as providing the DANDI plug-in 237, which provides the electronic form-filling voice functionality. In another embodiment, the functionality of the DANDI plug-in is already included with an application, such as a web browser application 241. The DANDI plug-in 237 includes or utilizes a speech synthesis API that converts a user's voice input into alphanumeric text. The DANDI plug-in 237 includes a metadata analysis feature whereby metadata associated with an electronic form is received by the DANDI plug-in 237 and then analyzed to determine the characteristics of the fields of the form. For example, the DANDI plug-in 237 is structured to determine, through metadata analysis, characteristics such as the total number of fields, the names of the fields, the data types of each of the fields, the maximum number of characters allowed in a field, the range of acceptable values for a field, etc.
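
A minimal sketch of the kind of per-field characteristics such metadata analysis might produce is shown below; the metadata layout, the field names, and the FieldSpec type are illustrative assumptions rather than the disclosed implementation.

```python
from dataclasses import dataclass

# Illustrative per-field characteristics derived from form metadata.
@dataclass
class FieldSpec:
    name: str
    data_type: str    # e.g., "alpha", "numeric", "alphanumeric", "date"
    max_chars: int
    required: bool

def analyze_metadata(form_metadata: dict) -> list:
    fields = [
        FieldSpec(f["name"], f.get("type", "alphanumeric"),
                  f.get("maxlength", 255), f.get("required", False))
        for f in form_metadata["fields"]
    ]
    print(f"total fields: {len(fields)}")
    return fields

# Assumed example metadata for a three-field form.
specs = analyze_metadata({"fields": [
    {"name": "name", "type": "alpha", "required": True},
    {"name": "date-of-birth", "type": "date", "maxlength": 10, "required": True},
    {"name": "phone", "type": "numeric", "maxlength": 10},
]})
```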

At process 604, a partial hands-free operation of the user device 130 is enabled. For example, an electronic form may be displayed by the user device 130. The user 120 may then click on a portion of a webpage of the user device 130, or the user 120 may issue a specific voice command or request that may be recognized by the client application 136 as the initiation of the hands-free mode of operation of the user device 130 by the user 120 in order to populate the fields of the form. The client application 136 may process the voice command to initiate the hands-free mode of operation, or it may pass the command on to the provider computing system 150 through the network interface circuit 131. In the latter embodiment, the provider enhancement circuit 156 in the provider computing system 150 may interpret the voice command to initiate the at least partial hands-free mode of operation by the user 120 of the user device 130, and inform the client application 136 of the initiation of the at least partial hands-free mode of operation, where "partial" means that the user is still able to provide manual inputs if desired.
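
For illustration only, a trivial sketch of recognizing a spoken initiation command follows; the trigger phrases are assumptions and not the disclosed command set.

```python
# Assumed trigger phrases for the at least partial hands-free mode.
INITIATION_PHRASES = {"start hands-free", "fill out this form", "enable dandi"}

def is_initiation_command(converted_text: str) -> bool:
    return converted_text.strip().lower() in INITIATION_PHRASES

hands_free_enabled = False
if is_initiation_command("Fill out this form"):
    hands_free_enabled = True   # manual inputs remain available ("partial" mode)
```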

At process 606, a speech or voice input is received. For example, the user 120 may provide a speech input regarding a field. The microphone 234 of the user device 130 may receive the speech input from the user 120, which corresponds to a value of a current field of a plurality of fields of the electronic form. The input is transmitted to the DANDI plug-in 237 and the client application. The client application, via the speaker, may prompt the user for information associated with a particular field in order to provide a conversational form (e.g., the metadata analysis may determine what information is required, and the client application may audibly request that the user provide this specific information via the speaker).

At process 608, the speech input is converted into text (e.g., alpha, numeric, or alphanumeric text). The client application 136, via the DANDI plug-in 237, converts the speech input received at process 606 from speech into text (e.g., alphanumeric text). In another embodiment, the client application 136 accesses a speech synthesis API residing on a speech synthesis API server 170 to convert the user speech input to alphanumeric text. In other embodiments, the client application 136 may include built-in support for a speech synthesis API that facilitates the synthesis of speech, i.e., conversion from speech to alphanumeric text and from alphanumeric text to speech.
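
As one non-authoritative sketch of this conversion step, the third-party speech_recognition Python library can stand in for a speech synthesis API; the library choice is an assumption for illustration and is not the API of the disclosure.

```python
import speech_recognition as sr  # third-party library used as a stand-in
                                 # for the speech synthesis API of process 608

recognizer = sr.Recognizer()
with sr.Microphone() as source:          # analogue of microphone 234
    audio = recognizer.listen(source)    # the speech input of process 606

try:
    text = recognizer.recognize_google(audio)   # speech -> alphanumeric text
except sr.UnknownValueError:
    text = None   # unrecognized input; compare the irregularity handling of FIG. 7
```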

At process 610, a field of the form is populated with the value corresponding to the text. The value refers to the characters (e.g., alphanumeric text) that are placed in the field based on the conversion of the speech input to alphanumeric text. In one embodiment, the client application 136, via the DANDI plug-in 237, first determines which field of the form is to be populated by analyzing the metadata describing the plurality of fields of the form. Then the client application 136, via the DANDI plug-in 237, populates the appropriate field of the form with the converted alphanumeric text. Finally, the client application 136 may provide a display on the user device 130 to permit a visual verification by the user 120 that the field value was populated correctly. For example, the value may be provided on a display screen 233 of the user device 130. This permits visual verification by the user 120 that the value has been entered into the correct field of the form, and that the value corresponds to the speech input provided by the user 120.

At process 612, a navigation to a next field in the form is performed. In this regard, the client application, via the DANDI plug-in 237, may determine whether there are additional fields in the form based on the metadata and whether various fields are populated or complete. The client application, via the plug-in 237, determines the priority order of the fields (i.e., the fields that should be filled first, such as the fields that require population before submission is allowed). At this point, the client application via the plug-in 237 determines that additional fields need to be populated and navigates to those fields to prompt the user to fill them according to the determined order. As described above, determining when to move or navigate from field to field may be done in a variety of different ways. For example, after the passage of a predetermined amount of time, the field may be determined to be populated, and a prompt for information for the next field is provided via the speaker to the user according to the determined order of fields. As another example, an affirmative input/confirmation, such as a click or vocal command from the user, is received that indicates the field is complete (a manual or verbal confirmation). As still another example, a prompt generated from instructions from the plug-in to a speaker of the user device may ask the user for information specific to the next field. With regard to this last example of providing an audible prompt for the required information for the next field, this situation facilitates a conversational form whereby the plug-in navigates from field to field conversationally with the user until all, or a sufficient number, of the fields are populated. Throughout this navigation, the converted speech-to-text may be displayed in each field for visual verification by the user.
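
The following sketch, reusing the illustrative FieldSpec type from above, shows one possible shape of this priority-ordered navigation with the timeout and confirmation heuristics; the prompt and listen callables and the ordering rule are assumptions, not the disclosed logic.

```python
import time

# Navigate fields in a determined priority order (required fields first, as an
# assumed rule), prompting audibly and accepting converted speech-to-text.
def navigate(fields: list, prompt, listen, timeout_s: float = 5.0) -> dict:
    values = {}
    ordered = sorted(fields, key=lambda f: (not f.required, f.name))
    for spec in ordered:
        prompt(f"Please provide your {spec.name}.")   # audible prompt via speaker
        start = time.monotonic()
        while spec.name not in values:
            utterance = listen()                      # converted speech-to-text, or None
            if utterance:
                values[spec.name] = utterance         # displayed for visual verification
            elif time.monotonic() - start > timeout_s:
                break   # timeout heuristic: treat the field as handled and move on
    return values
```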

In some embodiments, a user may notice one or more field values that the user desires to change or modify. Accordingly, the microphone may receive a voice command from the user to modify a previously populated field value, and subsequently receive a speech input from the user to override the value of the previously populated field.

At process 614, a completeness of the form is determined. The client application 136, via the DANDI plug-in 237, analyzes the metadata describing the plurality of fields of the form. Based on the analysis, the client application 136 determines whether a sufficient number of fields have been populated (a "completeness"). The "sufficient number of fields" may be all the fields or a predefined subset of the fields based on the metadata. For example, some fields may be optional and need not be filled in order for the form to be determined to be "complete." In this regard, a certain subset of fields may be required to be populated before the form is allowed to be submitted. If the form is complete, then the client application 136 proceeds to process 616. But if the determination is that there are more fields to populate, then the client application 136, via the DANDI plug-in 237, reverts back to process 612 to prompt the user 120 for the value of the next field to be populated.
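
A compact sketch of this completeness test follows, using the illustrative FieldSpec from earlier; the "required" flag stands in for the metadata's notion of fields that must be populated before submission is allowed.

```python
# A form is "complete" when every required field has a populated value;
# optional fields may remain empty (illustrative rule).
def is_complete(specs: list, values: dict) -> bool:
    return all(values.get(s.name) for s in specs if s.required)

def next_unpopulated(specs: list, values: dict):
    for s in specs:
        if s.required and not values.get(s.name):
            return s        # revert to process 612 and prompt for this field
    return None             # proceed to process 616
```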

At process 616, upon the form being determined to be complete, a prompt is provided to review the populated fields and/or to submit the form. For example, an audible prompt via the speaker from the DANDI plug-in and client application may be provided: "The form is complete. Would you like to submit the form?" The client application 136 provides a display on the user device 130 to prompt the user to indicate whether the user 120 wants to review the fields of the form, or whether the user 120 wants to submit the form.

At process 618, an indication to submit the form is received. Thus, submission of the form is enabled. The indication to submit the form may be provided vocally (e.g., as a vocal command as described above). The indication may also be provided manually (e.g., clicking on a submit button on the form, clicking save on a PDF form, etc.). Upon the user 120 providing an indication to submit the form, the client application 136 proceeds to process 620. On the other hand, if the user 120 indicates at process 616 that the user 120 wants to review the populated values of the plurality of fields of the form, then the client application 136, via the DANDI plug-in 237, reverts back to process 612 and prompts the user 120 to re-enter or accept the existing values in each of the plurality of fields of the form.

At process 620, a termination or disablement of the hands-free operation of the user device 130 by the user 120 is accomplished. In one embodiment, this step is performed by the client application 136 of the user device 130. Further, this step may be performed automatically upon submission of the form. Or, an explicit input from the user may be provided (e.g., a vocal command or a manual entry), which disables the hands-free or at least partial hands-free mode of operation. As yet another embodiment, the functionality provided by the DANDI plug-in may always be on. In this regard, one need not enable or disable the hands-free mode of operation. Rather, a user may simply click on a DANDI icon to initiate use of the functionality of the DANDI plug-in with a form. Alternatively or additionally, a user may navigate to a web page that hosts a form, and the functionality described herein with respect to the DANDI plug-in may be automatically or semi-automatically initiated (e.g., "Please confirm you would like to use DANDI" may be provided as a prompt to the user upon reaching the form on the webpage).

Method 600 provides the technical advantage of being able to navigate and populate the plurality of fields of an electronic form in a hands-free manner by interacting using a user's voice. The user's voice is used both to receive commands and to obtain the values of the fields of the form. In some embodiments, functionality implemented on the user device (e.g., analysis of the metadata, providing a speech synthesis API, etc.) may be offloaded to backend servers. This provides the technical advantage of less computational load on the processor(s) of the user device. Method 600 also provides several user benefits, such as the ability to populate entire forms in a hands-free manner by carrying out a conversation with the user device. In this regard, the speaker of the user device, based on instructions from the plug-in, may prompt the user for a value of a next field of the plurality of fields without an affirmative input that the current field is populated, in order to provide a conversational electronic form. Thus, and like a conversation, there is a free flow of movement from one field (one conversation topic) to another field (another conversation topic). This reduces the friction typically experienced when filling out electronic forms. The processes of method 600 distinguish between user commands and user speech input, which makes populating the fields of the form easy for the user even when the user makes mistakes, because the user can navigate back to previously populated fields through voice commands.

Method 600 also provides the benefit of error checking in at least two ways. First, the client application 136, via the DANDI plug-in 237, provides a visual depiction of each field as the field is filled. Second, the client application 136, via the DANDI plug-in 237, provides a visual depiction of the completed form prior to submission of the form. In some embodiments, the client application 136, via the DANDI plug-in 237, may read aloud the filled field as the field is populated. In operation, the client application 136, via the DANDI plug-in 237, may identify an error with respect to a provided field value (alternatively, as described below, the provider computing system 150 may determine one or more errors). For example, the DANDI plug-in 237 may expect certain values for each field based on the metadata analysis. If the provided information does not match the expected values (e.g., a phone number is expected and a home address value is provided), the client application 136, via the DANDI plug-in 237, may prompt the user to confirm their submission or request different information. Accordingly, and during population of the fields, errors may be determined by the DANDI plug-in 237 with respect to one or more field values. The client application, via the DANDI plug-in, may compare the received speech input (or converted text) to the expected value for the field(s) of the form, and where the expected values do not match the speech input (or converted text), an error is flagged. The DANDI plug-in 237 may then prompt the user to confirm the field values or to change the field values with determined errors. In some embodiments, and prior to submission of the form, the client application 136, via the DANDI plug-in 237, may read aloud each field and each field value. In either situation, a user may receive an auditory and visual indication of the populated field and/or fields of the form. Accordingly, an ability to check for errors is provided by the DANDI plug-in 237 to the user.
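
One way this expected-value comparison could look is sketched below; the regular-expression patterns are assumptions about what the metadata might lead the plug-in to expect, not the disclosed matching logic.

```python
import re

# Flag an error when converted text does not match the expected value type for
# a field (e.g., expecting a phone number, receiving an address). Illustrative.
EXPECTED_PATTERNS = {
    "numeric": re.compile(r"^\d+$"),
    "date": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
    "alpha": re.compile(r"^[A-Za-z .'-]+$"),
}

def check_value(spec, converted_text: str) -> bool:
    pattern = EXPECTED_PATTERNS.get(spec.data_type)
    ok = bool(pattern.match(converted_text)) if pattern else True
    if not ok:
        # Error flagged: prompt the user to confirm or change the field value.
        print(f"'{converted_text}' does not look like a {spec.data_type} "
              f"value for field '{spec.name}'. Please confirm or re-enter.")
    return ok
```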

In some embodiments, and in addition to the error checking provided, form validation and error messaging may be provided. In this way, the client application 136, via the DANDI plug-in 237, may provide a message to the user to confirm the value of one or more fields of the form. For example, the field may be date-of-birth yet the value depicts the name of the user. The DANDI plug-in 237 may compare the expected field value to the actual field value to determine that a potential error exists. For example, the expected field values are numerical in nature yet the received converted text consists of alpha characters. As a result, the client application 136, via the DANDI plug-in 237, may provide a message, such as an audible question, to the user: "The date-of-birth field includes a name and not a date. Would you like to return to this field to change the value?" In another embodiment, the error checking may include automatic correction. With respect to the previous example, the client application via the DANDI plug-in 237 may recognize that the value provided by the user is their name while the field is date-of-birth. Rather than populating the date-of-birth field, the client application, via the DANDI plug-in 237 and using the metadata analysis of the form, locates and populates the name field with the user's name. Then, the client application via the DANDI plug-in audibly prompts the user for their date-of-birth for the date-of-birth field. In this way, a smart form-filling aspect is provided via the DANDI plug-in 237. As a variation to this aspect, and using information from the auto-complete circuit 361, the plug-in 237 may proactively prevent errors as they occur. Sticking with the above example and knowing that the field is date-of-birth, despite the user providing an audible input of their name, the value visually depicted in the field is their date of birth. In this regard, the circuit 361 provides information regarding the user's date of birth. As the DANDI plug-in 237 compares the user's input (their name) to the field value (date-of-birth), the client application via the DANDI plug-in may disregard the user's voice input in favor of the date-of-birth information from the circuit 361, because this information matches the required field value. Thus, a proactive error correction feature may be provided.
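
A sketch of this smart relocation and proactive correction follows; the fits matcher, the profile dictionary (standing in for information from the auto-complete circuit 361), and all patterns are illustrative assumptions.

```python
import re

# Illustrative type matchers for deciding which field a spoken value fits.
PATTERNS = {"date": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
            "alpha": re.compile(r"^[A-Za-z .'-]+$")}

def fits(data_type: str, text: str) -> bool:
    p = PATTERNS.get(data_type)
    return bool(p.match(text)) if p else True

def smart_populate(specs, values, target, spoken, profile):
    if fits(target.data_type, spoken):
        values[target.name] = spoken
        return None
    # Relocate the spoken value to the field it actually fits (e.g., a name
    # spoken for the date-of-birth field is placed into the name field).
    for other in specs:
        if other.name not in values and fits(other.data_type, spoken):
            values[other.name] = spoken
            break
    # Proactive correction: prefer stored profile information matching the field.
    if target.name in profile:
        values[target.name] = profile[target.name]
        return None
    return f"Please provide your {target.name}."   # re-prompt audibly
```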

As mentioned above, a form validation feature may also be provided. This validation provides a holistic error examination process. In one embodiment, after the form is submitted, the form is sent to the provider computing system 150 rather than the end recipient computing system. Using stored submitted forms associated with the user, the system 150 compares the field values to previously submitted field values. The system 150 may then identify potential errors and either fix them before submitting the form or transmit a message back to the user for potential correction (e.g., a verbal prompt, a written message such as a push notification, etc.). If the system 150 determines that the form appears to be correctly filled (e.g., by matching the field values with the required information for each field to ensure a match), then the system 150 transmits the form to the end recipient. As another example, an audible prompt may be provided to the user via the client application to check the form prior to submission. If the user responds in the affirmative, then the form-to-be-submitted is transmitted to the system 150 for validation. This may be beneficial for long and complex forms, such as mortgage forms, where additional analysis is desired to ensure that there are no, or likely no, errors. In another embodiment, a carbon copy of the form and the populated fields is provided to the system 150 by the client application during population of the form. This may enable simultaneous error-checking of the form by the system 150. These validation procedures show the potential involvement of the system 150 in attempting to mitigate errors in the filling of the form. This may be used when such functionality is not included with the DANDI plug-in 237.

As mentioned herein, artificial intelligence, machine learning, and the like may be used by the processing circuit of the provider computing system 150. Artificial intelligence, such as the above-described convolutional neural networks, may also be used in the error-checking process of the electronic form prior to submission. As described herein, learning by the system 150 of the user's typical responses (e.g., home address, favorite pet, etc.) and voice characteristics via artificial intelligence may enable quicker filling of forms with less likelihood of errors.

FIG. 7 is a flowchart depicting a method 700 of providing refinements to speech input samples by the provider computing system 150, according to an example embodiment. In some embodiments, the speech input received from a user 120 may be distorted, garbled, attenuated, or irregular in some manner (for example, there are unexpected gaps in the speech input). In some embodiments, the irregularity in the speech input is due to an accent in the speech, or due to the speech input being in a foreign language.

At process 702, a speech input is received. In particular, a speech input for filling out a field of a plurality of fields of a form is received. The client application 136, via the DANDI plug-in 237, processes the speech input by passing the received speech input to the speech synthesis API.

At process 704, an irregularity in the speech input is determined. In one embodiment, the client application 136, via the DANDI plug-in 237, fails to recognize the syllables in the speech input after processing the speech input through the speech synthesis API. Due to the failure in recognizing the syllables in the received speech input, the client application 136 classifies the speech input as irregular. The client application 136 then forwards the speech input to the provider computing system 150 for refinement of the quality of the speech input. Some of the reasons for the speech input being irregular may be attenuation of the speech input, the presence of background noise, or an accent that is hard to recognize. As an example, the provider computing system 150 may determine that the irregularity is a non-English language speech input. This may be identified by the client application, via the DANDI plug-in, in that the speech input is not recognized, which causes the client application to transmit the speech input to the provider computing system 150. The processing circuit of the system 150 may then determine that the speech input is a non-English language speech input (e.g., via the speech enhancement circuit 359). Then, the provider computing system 150 may translate (e.g., via the speech enhancement circuit 359) the non-English language speech input into the English language as part of the refinement. Because of the storage capacity of the system 150, a translation may be accomplished in minimal time by the system 150 as compared to the client application and DANDI plug-in 237.
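
A schematic sketch of this routing and translation decision follows; the confidence score, the threshold, and the detect_language and translate_to_english callables are hypothetical stand-ins for functionality the disclosure attributes to the speech enhancement circuit 359.

```python
# Assumed recognition-confidence threshold below which the input is irregular.
CONFIDENCE_THRESHOLD = 0.6

def classify_and_route(stt_result, forward_to_provider):
    text, confidence = stt_result            # e.g., ("...", 0.31)
    if confidence >= CONFIDENCE_THRESHOLD:
        return text                          # syllables recognized; no refinement
    # Syllables not recognized: classify as irregular and forward for refinement.
    return forward_to_provider(text)

def provider_refine(text, detect_language, translate_to_english):
    lang = detect_language(text)             # e.g., "es" (hypothetical detector)
    if lang != "en":
        return translate_to_english(text, source=lang)
    return text                              # other refinements follow at process 706
```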

At process 706, the speech input is refined. In one embodiment, the speech enhancement circuit 359 of the provider computing system 150 processes the received speech input with artificial intelligence (AI) algorithms to refine the speech input samples. In some embodiments, the AI algorithms look up the historical voice inputs for a user 120 in the provider database 365 to identify a pattern in the user's 120 speech input, and then use extrapolation to refine the current speech input samples received in irregular form. In another embodiment, the speech enhancement circuit 359 digitally enhances the speech input samples through filtering and digital processing techniques to obtain better quality samples of the user's 120 speech input, improving the reliability of the recognition of the speech input. In still another embodiment, the speech enhancement circuit 359 of the provider computing system 150 leverages stored information for the specific user 120 in a provider database 365 by analyzing patterns in the user's past speech inputs. Based on the patterns, various refinements to the speech input may be performed. For example, based on known pronunciations of the user's home address, this information may be used to determine that a speech input is regarding the user's home address.
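
As an illustration of the filtering style of refinement, the sketch below band-passes a noisy sample to the typical speech band before recognition is retried; the sample rate and band edges are assumptions, and SciPy stands in for the circuit's digital processing.

```python
import numpy as np
from scipy.signal import butter, lfilter

# Band-pass a noisy speech sample to the assumed speech band (300-3400 Hz)
# to obtain a better-quality sample before re-running recognition.
def bandpass_speech(samples: np.ndarray, fs: int = 16000,
                    low: float = 300.0, high: float = 3400.0) -> np.ndarray:
    b, a = butter(4, [low, high], btype="band", fs=fs)
    return lfilter(b, a, samples)

noisy = np.random.randn(16000)          # stand-in for one second of audio
refined = bandpass_speech(noisy)        # refined samples for recognition
```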

At process 708, a refined speech output based on the refined speech input is provided. In particular, the speech enhancement circuit 359 of the provider computing system 150 is structured to provide the refined speech output back to the client application 136 of the user device 130. In addition, the processing circuit 154 may convert the refined speech input into text (e.g., alpha, numeric, or alphanumeric text). The system 150 then provides the text converted from the refined speech input to the user device. The client application 136 uses the text from the refined speech output to populate the value of a field in order to fill an electronic form in accordance with method 600 and the other disclosure contained herein.

The arrangements described herein have been described with reference to drawings. The drawings illustrate certain details of specific arrangements that implement the systems, methods, and programs described herein. However, describing the arrangements with drawings should not be construed as imposing on the disclosure any limitations that may be present in the drawings.

It should be understood that no claim element herein is to be construed under the provisions of 35 U.S.C. § 112(f), unless the element is expressly recited using the phrase "means for."

As used herein, the term "circuit" may include hardware structured to execute the functions described herein. In some arrangements, each respective "circuit" may include machine-readable media for configuring the hardware to execute the functions described herein. The circuit may be embodied as one or more circuitry components including, but not limited to, processing circuitry, network interfaces, peripheral devices, input devices, output devices, sensors, etc. In some arrangements, a circuit may take the form of one or more analog circuits, electronic circuits (e.g., integrated circuits (ICs), discrete circuits, system on a chip (SOC) circuits, etc.), telecommunication circuits, hybrid circuits, and any other type of "circuit." In this regard, the "circuit" may include any type of component for accomplishing or facilitating achievement of the operations described herein. For example, a circuit as described herein may include one or more transistors, logic gates (e.g., NAND, AND, NOR, OR, XOR, NOT, XNOR, etc.), resistors, multiplexers, registers, capacitors, inductors, diodes, wiring, and so on.

The "circuit" may also include one or more processors communicatively coupled to one or more memory or memory devices. In this regard, the one or more processors may execute instructions stored in the memory or may execute instructions otherwise accessible to the one or more processors. In some arrangements, the one or more processors may be embodied in various ways. The one or more processors may be constructed in a manner sufficient to perform at least the operations described herein. In some arrangements, the one or more processors may be shared by multiple circuits (e.g., circuit A and circuit B may comprise or otherwise share the same processor which, in some example arrangements, may execute instructions stored, or otherwise accessed, via different areas of memory). Alternatively, or additionally, the one or more processors may be structured to perform or otherwise execute certain operations independent of one or more co-processors. In other example arrangements, two or more processors may be coupled via a bus to enable independent, parallel, pipelined, or multi-threaded instruction execution. Each processor may be implemented as one or more general-purpose processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), or other suitable electronic data processing components structured to execute instructions provided by memory. The one or more processors may take the form of a single core processor, multi-core processor (e.g., a dual core processor, triple core processor, quad core processor, etc.), microprocessor, etc. In some arrangements, the one or more processors may be external to the apparatus, for example the one or more processors may be a remote processor (e.g., a cloud-based processor). Alternatively, or additionally, the one or more processors may be internal and/or local to the apparatus. In this regard, a given circuit or components thereof may be disposed locally (e.g., as part of a local server, a local computing system, etc.) or remotely (e.g., as part of a remote server such as a cloud-based server). To that end, a "circuit" as described herein may include components that are distributed across one or more locations.

An exemplary system for implementing the overall system or portions of the arrangements might include general purpose computers, each including a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit. Each memory device may include non-transient volatile storage media, non-volatile storage media, non-transitory storage media (e.g., one or more volatile and/or non-volatile memories), etc. In some arrangements, the non-volatile media may take the form of ROM, flash memory (e.g., flash memory such as NAND, 3D NAND, NOR, 3D NOR, etc.), EEPROM, MRAM, magnetic storage, hard discs, optical discs, etc. In other arrangements, the volatile storage media may take the form of RAM, TRAM, ZRAM, etc. Combinations of the above are also included within the scope of machine-readable media. In this regard, machine-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing machine to perform a certain function or group of functions. Each respective memory device may be operable to maintain or otherwise store information relating to the operations performed by one or more associated circuits, including processor instructions and related data (e.g., database components, object code components, script components, etc.), in accordance with the example arrangements described herein.

It should be noted that although the diagrams herein may show a specific order and composition of method steps, it is understood that the order of these steps may differ from what is depicted. For example, two or more steps may be performed concurrently or with partial concurrence. Also, some method steps that are performed as discrete steps may be combined, steps being performed as a combined step may be separated into discrete steps, the sequence of certain processes may be reversed or otherwise varied, and the nature or number of discrete processes may be altered or varied. The order or sequence of any element or apparatus may be varied or substituted according to alternative arrangements. Accordingly, all such modifications are intended to be included within the scope of the present disclosure as defined in the appended claims. Such variations will depend on the machine-readable media and hardware systems chosen and on designer choice. It is understood that all such variations are within the scope of the disclosure. Likewise, software and web implementations of the present disclosure could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps, and decision steps.

The foregoing description of arrangements has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from this disclosure. The arrangements were chosen and described in order to explain the principles of the disclosure and its practical application, and to enable one skilled in the art to utilize the various arrangements with various modifications as are suited to the particular use contemplated. Other substitutions, modifications, changes, and omissions may be made in the design, operating conditions, and arrangement of the arrangements without departing from the scope of the present disclosure as expressed in the appended claims.

What is claimed is:
1. A method, comprising: determining, by one or more processing circuits, a plurality of elements of a document; receiving, by the one or more processing circuits, a first speech input from a user to enable a mode of operation; authenticating, by the one or more processing circuits, the user by comparing the first speech input from the user with at least one voice sample of the user; in response to authenticating the first speech input, enabling, by the one or more processing circuits, the mode of operation; receiving, by the one or more processing circuits in the mode of operation, a second speech input for filling out a first element of the document, wherein the first element is selected based on a priority order; determining, by the one or more processing circuits, an irregularity or distortion in the second speech input based on the first element and identifying a missing syllable or a distorted syllable in the second speech input, wherein identifying the missing syllable or the distorted syllable comprises executing an analysis of the second speech input, wherein either (1) the missing syllable is determined based on other syllables identified in the analysis, or (2) the distorted syllable is determined based on failing to recognize a syllable in the analysis; refining, by the one or more processing circuits, the second speech input into at least one matching syllable by extrapolating the missing syllable or the distorted syllable based on stored syllables of a plurality of speech inputs, wherein the at least one matching syllable is determined at least in part on an expected element value associated with the first element; converting, by the one or more processing circuits, the refined second speech input comprising the at least one matching syllable into text; and providing, by the one or more processing circuits, the text to a user device to populate the first element with the text.
2. The method of claim 1, wherein the irregularity is a first irregularity and the method further comprises: determining, by the one or more processing circuits, a second irregularity that is a non-English language speech input; identifying, by the one or more processing circuits, a language of the non-English language speech input of the second irregularity; and translating, by the one or more processing circuits, the non-English language speech input into English language.
3. The method of claim 1, wherein determining the irregularity in the second speech input is based on identifying the distorted syllable in the second speech input, and the method further comprises: determining, by the one or more processing circuits, that the distorted syllable is due to at least one of an attenuation of the second speech input, a presence of background noise, or an accent in the second speech input.
4. The method of claim 3, further comprising: transmitting, by the one or more processing circuits, the second speech input to a speech enhancement circuit to at least partially mitigate the irregularity in the second speech input.
5. The method of claim 4, further comprising: receiving, by the one or more processing circuits, a mitigated speech output from the speech enhancement circuit as a refinement to at least partially mitigate the irregularity in the second speech input.
6. The method of claim 1, further comprising: highlighting, by the one or more processing circuits on a display screen of the user device, a second element of the plurality of elements in response to determining that the first element is populated with the text.
7. The method of claim 1, further comprising: determining, by the one or more processing circuits, the expected element value for the first element based on metadata; comparing, by the one or more processing circuits, the second speech input to the expected element value; and determining, by the one or more processing circuits, that the second speech input does not match the expected element value of the first element.
8. The method of claim 1, further comprising: correcting, by the one or more processing circuits, an error in the first element by disregarding the received second speech input for a second value of the first element in favor of information that matches the expected element value of the first element.
9. The method of claim 1, further comprising: filtering, by the one or more processing circuits through at least one digital processing technique, the second speech input to remove at least a portion of the irregularity.
10. The method of claim 1, wherein the refinement of the second speech input comprises: executing, by the one or more processing circuits, at least one artificial intelligence algorithm to compare each syllable in the second speech input to the stored syllables in a database to find a closest match for the missing syllable or the distorted syllable in the second speech input; and providing, by the one or more processing circuits, at least one user specific auto-complete suggestion based on information stored in the database associated with the user, wherein the information represents stored values corresponding to multiple elements of documents previously filled by the user.
11. A system, comprising: one or more processing circuits configured to: determine a plurality of elements of a document; receive a first speech input from a user to enable a mode of operation; authenticate the user by comparing the first speech input from the user with at least one voice sample of the user; in response to authenticating the first speech input, enable the mode of operation; receive, in the mode of operation, a second speech input for filling out a first element of the document, wherein the first element is selected based on a priority order; determine an irregularity or distortion in the second speech input based on the first element and identifying a missing syllable or a distorted syllable in the second speech input, wherein identifying the missing syllable or the distorted syllable comprises executing an analysis of the second speech input, wherein either (1) the missing syllable is determined based on other syllables identified in the analysis, or (2) the distorted syllable is determined based on failing to recognize a syllable in the analysis; refine the second speech input into at least one matching syllable by extrapolating the missing syllable or the distorted syllable based on stored syllables of a plurality of speech inputs, wherein the at least one matching syllable is determined at least in part on an expected element value associated with the first element; convert the refined second speech input comprising the at least one matching syllable into text; and provide the text to a user device to populate the first element with the text.
12. The system of claim 11, wherein the irregularity is a first irregularity and the one or more processing circuits are further configured to: determine a second irregularity that is a non-English language speech input; identify a language of the non-English language speech input; and translate the non-English language speech input into English language.
13. The system of claim 11, wherein determining the irregularity in the second speech input is based on identifying the distorted syllable in the second speech input, and the one or more processing circuits are further configured to: determine that the distorted syllable is due to at least one of an attenuation of the second speech input, a presence of background noise, or an accent in the second speech input.
14. The system of claim 13, wherein the one or more processing circuits are further configured to: transmit the second speech input to a speech enhancement circuit to at least partially mitigate the irregularity in the second speech input.
15. The system of claim 14, wherein the one or more processing circuits are further configured to: receive a mitigated speech output from the speech enhancement circuit as a refinement to at least partially mitigate the irregularity in the second speech input.
16. The system of claim 11, wherein the one or more processing circuits are further configured to: highlight a second element of the plurality of elements in response to determining that the first element is populated with the text.
17. One or more non-transitory computer-readable storage media having instructions stored thereon that, when executed by one or more processing circuits, cause the one or more processing circuits to perform operations comprising: determining a plurality of elements of a document; receiving a first speech input from a user to enable a mode of operation; authenticating the user by comparing the first speech input from the user with at least one voice sample of the user; in response to authenticating the first speech input, enabling the mode of operation; receiving, in the mode of operation, a second speech input for filling out a first element of the document, wherein the first element is selected based on a priority order; determining an irregularity or distortion in the second speech input based on the first element and identifying a missing syllable or a distorted syllable in the second speech input, wherein identifying the missing syllable or the distorted syllable comprises executing an analysis of the second speech input, wherein either (1) the missing syllable is determined based on other syllables identified in the analysis, or (2) the distorted syllable is determined based on failing to recognize a syllable in the analysis; refining the second speech input into at least one matching syllable by extrapolating the missing syllable or the distorted syllable based on stored syllables of a plurality of speech inputs, wherein the at least one matching syllable is determined at least in part on an expected element value associated with the first element; converting the refined second speech input comprising the at least one matching syllable into text; and providing the text to a user device to populate the first element with the text.
18. The one or more non-transitory computer-readable storage media of claim 17, wherein the instructions, when executed by the one or more processing circuits, further cause the one or more processing circuits to perform operations comprising: highlighting, on a display screen of the user device, a second element of the plurality of elements in response to determining that the first element is populated with the text.
19. The one or more non-transitory computer-readable storage media of claim 17, wherein the instructions, when executed by the one or more processing circuits, further cause the one or more processing circuits to perform operations comprising: determining the expected element value for the first element based on metadata; comparing the second speech input to the expected element value; and determining that the second speech input does not match the expected element value of the first element.
20. The one or more non-transitory computer-readable storage media of claim 17, wherein the instructions, when executed by the one or more processing circuits, further cause the one or more processing circuits to perform operations comprising: correcting an error in the first element by disregarding the received second speech input for a second value of the first element in favor of information that matches the expected element value of the first element.